Introduction

We have each customer’s ZIP code. Combined with publicly available information from sources such as INEGI, it could be used to estimate the socioeconomic level of each savings customer.

Available sources:

AGEB stands for Área Geoestadística Básica (Basic Geostatistical Area); a locality is a broader unit, used by CONAPO, that groups several AGEBs.

This document uses information from the socioeconomic regions defined by INEGI.

Geographic information for ZIP codes is also available. According to the official postal code website, Mexico has 32,448 distinct ZIP codes, of which 14,871 are available as shapefiles.

Problem:

The polygons that define the ZIP codes do not coincide with the polygons that define the AGEBs, so a mapping between the two is needed before the publicly available information can be used.

Possible solutions:

  1. Mapping from centroid to centroid
  2. Polygon convex combination

First approach: mapping from centroid to centroid

Perhaps the simplest solution is to compute the centroid of each ZIP code polygon and each AGEB polygon, and then map each ZIP code to the AGEB whose centroid is closest to the ZIP code’s centroid.
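As a minimal sketch of the centroid step, assuming each polygon is a single closed ring of longitude/latitude vertices treated as planar coordinates (a reasonable approximation for areas as small as an AGEB), the centroid can be computed with the shoelace formula. The function name `polygon_centroid` is hypothetical; multipart polygons and holes are not handled, and in practice a spatial package such as sf (`st_centroid`) would usually be used instead.

```r
# Centroid of a simple (non-self-intersecting) polygon via the shoelace formula.
# x, y: vertex coordinates (e.g. longitude, latitude) in order, without
# repeating the first vertex at the end. Coordinates are treated as planar.
polygon_centroid <- function(x, y) {
  # Pair each vertex with the next one, wrapping around to close the ring
  x_next <- c(x[-1], x[1])
  y_next <- c(y[-1], y[1])
  cross <- x * y_next - x_next * y   # per-edge cross products
  area <- sum(cross) / 2             # signed area (sign depends on orientation)
  if (abs(area) < .Machine$double.eps) stop("degenerate polygon (zero area)")
  # The signed area cancels out, so clockwise and counter-clockwise rings agree
  cx <- sum((x + x_next) * cross) / (6 * area)
  cy <- sum((y + y_next) * cross) / (6 * area)
  c(lon = cx, lat = cy)
}
```

For example, `polygon_centroid(c(0, 1, 1, 0), c(0, 0, 1, 1))` returns the centre of the unit square, (0.5, 0.5).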

We have a classification for each AGEB that aims to capture the differences among AGEBs through indicators related to housing, education, health and employment, built from the most recent population census. Each AGEB is assigned to one of 7 strata: stratum 7 contains the AGEBs with the most favorable average conditions, and stratum 1 contains those with the least favorable average conditions.

The following images show maps of Mexico City and its surroundings, Guadalajara and Monterrey.

Map of Mexico City with the centroid of each polygon:

The same map for Guadalajara, Jalisco:

And finally, for Monterrey, Nuevo León:

The next map shows the ZIP code polygons of Mexico City with their centroids:

The next map shows the ZIP code polygons of Guadalajara with their centroids. Some centroids may not match the plotted polygon exactly, because the database treats the ZIP code and the identifier as different groups.

The next map shows the ZIP code polygons of Monterrey with their centroids:

Finally, plotting the AGEB and ZIP code centroids together for Mexico City gives:

Guadalajara:

Monterrey:

For each available ZIP code, we find the closest AGEB centroid and assign that AGEB (and its stratum) to the ZIP code. The result is a table in the following format (ZIP codes appear without their leading zero, so 1000 corresponds to 01000):

ZIP   ZIP lon    ZIP lat   Nearest AGEB   AGEB lon   AGEB lat  Distance (km)  Stratum
1000  -99.19328  19.34674  0901000011129  -99.19294  19.34740  0.0813726      7
1010  -99.19391  19.36064  0901000010972  -99.19487  19.36071  0.1007520      7
1020  -99.18719  19.35735  0901000010987  -99.18858  19.36013  0.3419371      7
1030  -99.17933  19.35734  0901000011063  -99.17872  19.35448  0.3237404      7
1040  -99.19279  19.35596  0901000011044  -99.19429  19.35471  0.2092340      7
1048  -99.20502  19.36202  090100001092A  -99.20353  19.36357  0.2330690      6
1049  -99.19715  19.35357  0901000011044  -99.19429  19.35471  0.3255840      7
1050  -99.18253  19.34970  0901000011133  -99.18462  19.34641  0.4264119      7
1060  -99.19831  19.34950  0901000011114  -99.19524  19.35028  0.3327968      7
1070  -99.18654  19.34449  0901000011133  -99.18462  19.34641  0.2946388      7
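The nearest-centroid assignment behind this table can be sketched as a brute-force search using the haversine (great-circle) distance. This is an assumption about how the distances were computed: for the first row, haversine on a 6371 km sphere gives about 0.0816 km versus 0.0813726 km in the table, so the original may have used a different distance function (for example an ellipsoidal one). The functions `haversine_km` and `nearest_ageb` and their column names are hypothetical.

```r
# Great-circle distance in km between points given in decimal degrees,
# using the haversine formula on a sphere of radius `radius` km.
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))  # pmin guards against rounding above 1
}

# For each ZIP centroid, find the AGEB whose centroid is closest (brute force).
# zips:  data.frame with columns zip, lon, lat
# agebs: data.frame with columns ageb, lon, lat, stratum
nearest_ageb <- function(zips, agebs) {
  rows <- lapply(seq_len(nrow(zips)), function(i) {
    # Distances from ZIP i to every AGEB centroid (vectorized)
    d <- haversine_km(zips$lon[i], zips$lat[i], agebs$lon, agebs$lat)
    j <- which.min(d)
    data.frame(zip = zips$zip[i], zip_lon = zips$lon[i], zip_lat = zips$lat[i],
               ageb = agebs$ageb[j], ageb_lon = agebs$lon[j],
               ageb_lat = agebs$lat[j], dist_km = d[j],
               stratum = agebs$stratum[j], stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}
```

The brute-force search computes one distance per ZIP/AGEB pair; for the national tables a spatial index (such as a k-d tree) would reduce the cost, at the price of an extra dependency.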

This approach has a clear limitation: as the maps show, ZIP code polygons are generally larger than AGEBs, so assigning a single AGEB to each ZIP code ignores the heterogeneity within the ZIP code.

Second approach: pending
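One possible reading of the “polygon convex combination” idea, sketched here under that assumption only, is to weight every AGEB that overlaps a ZIP code by its share of the overlap area. Because the weights are non-negative and sum to one, the ZIP code’s score is a convex combination of the AGEB strata. The input table `overlap` (with columns `zip`, `stratum` and `area`) and the function `convex_stratum` are hypothetical, and the intersection areas are assumed to have been computed beforehand, for instance with a GIS library.

```r
# Hypothetical sketch: area-weighted (convex) combination of AGEB strata per ZIP.
# overlap: data.frame with one row per (ZIP, AGEB) pair that intersect, with
#   columns zip, stratum (1-7) and area (intersection area, assumed > 0).
convex_stratum <- function(overlap) {
  groups <- split(overlap, overlap$zip)
  vapply(groups, function(g) {
    w <- g$area / sum(g$area)  # non-negative weights that sum to 1
    sum(w * g$stratum)         # convex combination of the AGEB strata
  }, numeric(1))
}
```

The result is a continuous score between 1 and 7 rather than an integer stratum, so it would need rounding or binning if a discrete classification is required.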

Customer analysis

First, let’s look at the distribution of the AGEB classification across the country. Recall that stratum 7 means the AGEB has the most favorable conditions on average (“good”) and stratum 1 the least favorable (“bad”).

Next, the same distribution for the mapped ZIP codes:

The distribution changes drastically. As the following graph shows, the original AGEBs are both urban (U) and rural (R), but the mapped ZIP codes are all urban, which may explain much of the change.
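One way to check how much of this change comes from the urban/rural composition is to compare the share of each stratum overall and within each AGEB type. A small base-R sketch, assuming a data frame with hypothetical columns `stratum` (1 to 7) and `type` ("U" or "R"); the function name `stratum_shares` is also hypothetical:

```r
# Share of AGEBs in each stratum, overall and within each type (U/R).
# agebs: data.frame with columns stratum (1-7) and type ("U" or "R").
stratum_shares <- function(agebs) {
  strata <- factor(agebs$stratum, levels = 1:7)  # keep empty strata as 0
  list(
    overall = prop.table(table(strata)),
    # margin = 1: each row (type) sums to 1
    by_type = prop.table(table(type = agebs$type, stratum = strata), margin = 1)
  )
}
```

If the urban row of `by_type` looks like the ZIP code distribution, the shift is mostly a composition effect rather than an artifact of the mapping.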

Now let’s analyze the sample of 1 million savings customers.

Out of the 1 million customers, `r sum(datos$zip_code %in% mapeo$CP)` have a ZIP code that appears in the mapping; they are distributed as follows:
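The inline expression counts the customers whose ZIP code appears in the mapping. A base-R sketch of that count, which also attaches the mapped stratum to each matched customer, could look like this. Only `datos`, `zip_code`, `mapeo` and `CP` come from the expression; the function `attach_stratum` and the `stratum` column are hypothetical. Converting the ZIP code to an integer drops leading zeros ("01000" becomes 1000, matching the ZIP values in the mapping table above), and entries that are not numeric become `NA`.

```r
# Hypothetical sketch: count customers with a mapped ZIP code and join the
# mapping (including its assumed `stratum` column) onto those customers.
attach_stratum <- function(datos, mapeo) {
  # as.character first, so factor columns are not converted to level codes;
  # non-numeric ZIP codes become NA (the coercion warning is suppressed).
  datos$CP <- suppressWarnings(as.integer(as.character(datos$zip_code)))
  # Exclude NA explicitly: NA %in% x is TRUE if x itself contains NA
  matched <- !is.na(datos$CP) & datos$CP %in% mapeo$CP
  list(
    n_matched = sum(matched),
    joined = merge(datos[matched, , drop = FALSE], mapeo, by = "CP")
  )
}
```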
